Keyword selection and processing strategy for applying text mining to patent analysis

Authors

  • Heeyong Noh
  • Yeongran Jo
  • Sungjoo Lee
Abstract

Previous studies have applied various methodologies to analyze patent data for technology management, given the advances in data analysis techniques available. In particular, efforts have recently been made to use text mining (i.e., extracting keywords from patent documents) for patent analysis purposes. The results of these studies may be affected by the keywords selected from the relevant documents, yet despite the importance of this choice, the existing literature has seldom explored strategies for selecting and processing keywords from patent documents. The purpose of this research is to fill this gap by focusing on keyword strategies for applying text mining to patent data. Specifically, four factors are addressed: (1) which element of the patent documents to adopt for keyword selection, (2) what keyword selection methods to use, (3) how many keywords to select, and (4) how to transform the keyword selection results into an analyzable data format. An experiment based on an orthogonal array of the four factors was designed in order to identify the best strategy, in which the four factors were evaluated and compared through k-means clustering and entropy values. The research findings are expected to offer useful guidelines for how to select and process keywords for patent analysis, and so further increase the reliability and validity of research using text mining for patent analysis. © 2015 Elsevier Ltd. All rights reserved.
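As a rough illustration of factors (2) to (4) and of an entropy-based evaluation, the following Python sketch selects keywords, converts documents into vectors, and scores a clustering. It is not the authors' procedure: the TF-IDF ranking, the binary versus frequency transformation, the size-weighted entropy definition, and all function names are assumptions made for this example, and the k-means step itself is omitted (cluster assignments are taken as given).

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k):
    """Return the top_k terms ranked by TF-IDF summed over all documents.

    docs is a list of token lists. TF-IDF is an assumed selection method;
    the abstract does not say which methods the study compared.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def to_vectors(docs, keywords, binary):
    """Turn each document into a keyword vector.

    binary=True records presence (0/1); binary=False records raw counts.
    Both transformations are assumed examples of "analyzable data formats".
    """
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        if binary:
            vectors.append([1 if counts[k] else 0 for k in keywords])
        else:
            vectors.append([counts[k] for k in keywords])
    return vectors


def cluster_entropy(cluster_labels, true_labels):
    """Size-weighted average entropy (in bits) of true classes per cluster.

    Lower values mean purer clusters. This definition is an assumption.
    """
    n = len(cluster_labels)
    total = 0.0
    for cluster in set(cluster_labels):
        members = [t for c, t in zip(cluster_labels, true_labels) if c == cluster]
        size = len(members)
        h = -sum((m / size) * math.log2(m / size) for m in Counter(members).values())
        total += (size / n) * h
    return total


# Hypothetical tokenized patent texts (e.g. abstracts), made up for the example.
docs = [
    ["battery", "cell", "anode", "battery"],
    ["battery", "cathode", "cell"],
    ["antenna", "signal", "antenna"],
    ["signal", "receiver", "antenna"],
]
keywords = tfidf_keywords(docs, top_k=2)
freq_vectors = to_vectors(docs, keywords, binary=False)
binary_vectors = to_vectors(docs, keywords, binary=True)
```

In an experiment of the kind the abstract describes, vectors like these would be clustered (for example with k-means) under each combination of factor levels, and the resulting entropy values compared across combinations.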

Similar resources

How to Use Patent Information to Search Potential Technology Partners in Open Innovation

With the increasing trend towards collaborations for innovation across organizational boundaries, the strategic gravity of exploring potential technology partners has been accentuated in the paradigm of open innovation. However, as the openness across nations or industries has become broad, the conventional approaches to searching external partners have encountered a number of difficulties. The...

An approach to discovering new technology opportunities: Keyword-based patent map approach

This paper proposes an approach for creating and utilizing keyword-based patent maps for use in new technology creation activity. The proposed approach comprises the following sub-modules. First, text mining is used to transform patent documents into structured data to identify keyword vectors. Second, principal component analysis is employed to reduce the numbers of keyword vectors to make sui...

A Text Mining Technique Using Association Rules Extraction

This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique is called Extracting Association Rules from Text (EART). It depends on keyword features to discover association rules amongst keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, but instead focusing o...

Applying an integrated fuzzy gray MCDM approach: A case study on mineral processing plant site selection

The accurate selection of a processing plant site can result in decreasing total mining cost. This problem can be solved by multi-criteria decision-making (MCDM) methods. This research introduces a new approach by integrating fuzzy AHP and gray MCDM methods to solve all decision-making problems. The approach is applied in the case of a copper mine area. The critical criteria are considered adja...

Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification

Many text mining tasks such as text retrieval, text summarization, and text comparisons depend on the extraction of representative keywords from the main text. Most existing keyword extraction algorithms are based on discrete bag-of-words type of word representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) based on the distributed Skip-gram model for p...

Journal title:
  • Expert Syst. Appl.

Volume 42

Publication date: 2015